Bat algorithm for high utility itemset mining based on length constraint
Quan YUAN, Chengliang TANG, Yunpeng XU
Journal of Computer Applications    2023, 43 (5): 1473-1480.   DOI: 10.11772/j.issn.1001-9081.2022040622

To mine High Utility Itemsets (HUIs) that meet special user needs, such as a specified number of items, a Bat Algorithm for High Utility Itemset Mining based on Length Constraint (HUIM-LC-BA) was proposed. By combining the Bat Algorithm (BA) with length constraints, a new High Utility Itemset Mining (HUIM) model was constructed, in which the database was transformed into a bitmap matrix to realize efficient utility calculation and database scanning. Then, the search space was reduced by using the Redefined Transaction Weighted Utility (RTWU) strategy. Finally, the lengths of the itemsets were pruned according to the items determined by roulette wheel selection and depth-first search. Experiments on four datasets showed that, when the maximum length was 6, the number of patterns mined by HUIM-LC-BA was reduced by 91%, 98%, 99% and 97% respectively compared with that of HUIM-BA (High Utility Itemset Mining-Bat Algorithm), with less running time; and under different length constraints, the running time of HUIM-LC-BA was more stable than that of the FHM+ (Faster High-utility itemset Mining plus) algorithm. Experimental results indicate that HUIM-LC-BA can effectively mine HUIs with length constraints and reduce the number of mined patterns.
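
The bitmap representation and the length constraint mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' HUIM-LC-BA: the bat-algorithm search, the RTWU pruning and the roulette wheel selection are replaced here by plain exhaustive enumeration, and the toy database, the function names (`build_bitmap`, `utility`, `mine`) and the thresholds are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps item -> utility in that transaction.
transactions = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 10, "c": 6},
    {"b": 4, "c": 3, "d": 8},
    {"a": 5, "b": 2, "d": 4},
]

def build_bitmap(transactions):
    """Map each item to an integer whose bit `tid` is set if the item occurs in transaction `tid`."""
    bitmap = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            bitmap[item] = bitmap.get(item, 0) | (1 << tid)
    return bitmap

def utility(itemset, bitmap, transactions):
    """Sum the utility of `itemset` over the transactions that contain all of its items."""
    cover = (1 << len(transactions)) - 1  # start with every transaction
    for item in itemset:
        cover &= bitmap[item]             # bitwise AND keeps only shared transactions
    total, tid = 0, 0
    while cover:
        if cover & 1:
            total += sum(transactions[tid][item] for item in itemset)
        cover >>= 1
        tid += 1
    return total

def mine(transactions, min_util, max_len):
    """Exhaustively list itemsets of at most `max_len` items with utility >= `min_util`."""
    bitmap = build_bitmap(transactions)
    result = {}
    for size in range(1, max_len + 1):    # the length constraint bounds the itemset size
        for itemset in combinations(sorted(bitmap), size):
            u = utility(itemset, bitmap, transactions)
            if u >= min_util:
                result[itemset] = u
    return result

if __name__ == "__main__":
    print(mine(transactions, min_util=15, max_len=2))
```

Lowering `max_len` shrinks the set of reported patterns, which is the effect the abstract reports for the length constraint; exhaustive enumeration is only practical for toy data like this.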

Document-level relation extraction method based on path labels
Quan YUAN, Yunpeng XU, Chengliang TANG
Journal of Computer Applications    2023, 43 (4): 1029-1035.   DOI: 10.11772/j.issn.1001-9081.2022030327

Due to the high complexity of text processing in document-level relation extraction, it is difficult to extract entity relations efficiently. Therefore, a path label based document-level relation extraction method was proposed to select key evidence sentences. Firstly, the path label was introduced to replace the entity sentence as the processed text dataset for data preprocessing. At the same time, combined with the U-Net model of semantic segmentation, the encoding module at the input end was used to capture the context information of the document entities, and the U-Net semantic segmentation module was applied over an image-style feature map to capture the global dependencies among entity triples. Finally, a Softmax function was introduced to decrease the noise of text extraction. Theoretical analysis and simulation results show that, compared with the graph neural network-based RoBERTa (Robustly optimized Bidirectional Encoder Representations from Transformers) relation extraction algorithm (RoBERTa-ATLOP), Path+U-Net increased the F1-score on the development and test sets of the Document-level Relation Extraction Dataset (DocRED) by 1.31 and 0.54 percentage points respectively, and improved the F1-score on the development and test sets of the Chemical Disease Response (CDR) dataset by 1.32 and 1.19 percentage points respectively. Path+U-Net also has a lower extraction cost for the datasets and a higher text extraction accuracy, while the correlation between entities remains consistent with that in the original dataset. Experimental results show that the proposed extraction algorithm based on path labels can effectively improve the extraction efficiency for long texts.
